We thank the reviewers for the detailed comments, suggestions, and the positive assessment of our work. We will correct the color schemes in all figures (R1). We have also made the figure captions cleaner (R3), and we have added a description of the setup to the paper. In Fig. 5 (left), DisCor actually outperforms uniform sampling over state-action tuples, Unif(s, a), on these environments. In the revised version of our paper, we will clarify the details in Section 3 (R2) and make the intuition in the methods section much clearer.
LDM$^2$: A Large Decision Model Imitating Human Cognition with Dynamic Memory Enhancement
Wang, Xingjin, Li, Linjing, Zeng, Daniel
With the rapid development of large language models (LLMs), there is a strong demand for adopting LLMs as decision makers on the path toward artificial general intelligence. Most approaches leverage manually crafted examples to prompt the LLMs to imitate the human decision process. However, designing optimal prompts is difficult, and such patterned prompts can hardly generalize to more complex environments. In this paper, we propose a novel model named Large Decision Model with Memory (LDM$^2$), which leverages a dynamic memory mechanism to construct dynamic prompts, guiding the LLMs to make proper decisions according to the current state. LDM$^2$ consists of two stages: memory formation and memory refinement. In the former stage, human behaviors are decomposed into state-action tuples using the powerful summarizing ability of LLMs. These tuples are then stored in the memory, with indices generated by the LLMs, to facilitate retrieval of the most relevant subset of memorized tuples given the current state. In the latter stage, LDM$^2$ employs tree exploration to discover more suitable decision processes and enriches the memory by adding valuable state-action tuples. This dynamic cycle of exploration and memory enhancement gives LDM$^2$ a better understanding of the global environment. Extensive experiments conducted in two interactive environments show that LDM$^2$ outperforms the baselines in terms of both score and success rate, demonstrating its effectiveness.
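The abstract does not include code; the following is a minimal sketch of how a state-action tuple memory with state-based retrieval might look. The class and function names (Memory, add, retrieve, embed), the bag-of-words "index", and the cosine-similarity retrieval are illustrative assumptions, not the authors' implementation, in which the indices are generated by the LLM itself.

```python
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Toy bag-of-words "index"; LDM^2 instead lets the LLM generate indices.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class Memory:
    """Stores (state, action) tuples and retrieves those most relevant to the current state."""
    def __init__(self):
        self.tuples = []  # list of (index_embedding, state, action)

    def add(self, state: str, action: str) -> None:
        self.tuples.append((embed(state), state, action))

    def retrieve(self, current_state: str, k: int = 3):
        q = embed(current_state)
        ranked = sorted(self.tuples, key=lambda t: cosine(q, t[0]), reverse=True)
        return [(s, a) for _, s, a in ranked[:k]]

# Memory formation: decompose demonstrated behavior into state-action tuples.
mem = Memory()
mem.add("you are in the kitchen, the drawer is closed", "open drawer")
mem.add("you are holding a knife near the counter", "put knife on counter")

# At decision time, the retrieved tuples would be inserted into the prompt.
print(mem.retrieve("you are in the kitchen and see a closed drawer", k=1))
```

In the refinement stage, tuples discovered during tree exploration would be appended to the same store via `add`, so later retrievals draw on the enriched memory.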
CLARE: Conservative Model-Based Reward Learning for Offline Inverse Reinforcement Learning
Yue, Sheng, Wang, Guanbo, Shao, Wei, Zhang, Zhaofeng, Lin, Sen, Ren, Ju, Zhang, Junshan
This work aims to tackle a major challenge in offline Inverse Reinforcement Learning (IRL), namely the reward extrapolation error, where the learned reward function may fail to explain the task correctly and misguide the agent in unseen environments due to the intrinsic covariate shift. Leveraging both expert data and lower-quality diverse data, we devise a principled algorithm (namely CLARE) that solves offline IRL efficiently by integrating "conservatism" into a learned reward function and utilizing an estimated dynamics model. Our theoretical analysis provides an upper bound on the return gap between the learned policy and the expert policy, based on which we characterize the impact of covariate shift by examining the subtle two-tier tradeoffs between "exploitation" (on both expert and diverse data) and "exploration" (on the estimated dynamics model). We show that CLARE can provably alleviate the reward extrapolation error by striking the right "exploitation-exploration" balance therein. Extensive experiments corroborate the significant performance gains of CLARE over existing state-of-the-art algorithms on MuJoCo continuous control tasks (especially with a small offline dataset), and the learned reward is highly instructive for further learning (source code).

The primary objective of Inverse Reinforcement Learning (IRL) is to learn a reward function from demonstrations (Arora & Doshi, 2021; Russell, 1998). In general, conventional IRL methods rely on extensive online trial and error, which can be costly, or require a fully known transition model (Abbeel & Ng, 2004; Ratliff et al., 2006; Ziebart et al., 2008; Syed & Schapire, 2007; Boularias et al., 2011; Osa et al., 2018), and thus struggle to scale in many real-world applications. To tackle this problem, this paper studies offline IRL, with a focus on learning from a previously collected dataset without online interaction with the environment.
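CLARE's exact objective is not reproduced here; the following is a schematic NumPy sketch of the general idea of "conservative" reward learning described above: the reward is pushed up on expert data (and, with a smaller weight, diverse data) and pushed down on state-actions sampled from the estimated dynamics model, so that it does not extrapolate optimistically outside the dataset. The feature map, penalty form, function names, and weights (`beta`, `lam`) are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def features(states, actions):
    # Illustrative feature map phi(s, a); a real implementation would use a network.
    return np.concatenate([states, actions], axis=1)

def conservative_reward_step(w, expert, diverse, model_samples, lr=0.1, beta=1.0, lam=0.5):
    """One gradient ascent step on a schematic conservative objective:
       maximize  E_expert[r] + lam * E_diverse[r] - beta * E_model[r]  - L2 penalty,
       where r(s, a) = w . phi(s, a). Lowering the reward on model-generated
       samples discourages optimistic extrapolation beyond the data."""
    grad = (features(*expert).mean(axis=0)
            + lam * features(*diverse).mean(axis=0)
            - beta * features(*model_samples).mean(axis=0)
            - 0.01 * w)  # small L2 regularizer keeps the weights bounded
    return w + lr * grad

# Toy data: 2-D states, 1-D actions.
expert = (rng.normal(size=(32, 2)), rng.normal(size=(32, 1)))
diverse = (rng.normal(size=(32, 2)), rng.normal(size=(32, 1)))
model_samples = (rng.normal(size=(32, 2)), rng.normal(size=(32, 1)))

w = np.zeros(3)
for _ in range(100):
    w = conservative_reward_step(w, expert, diverse, model_samples)
print("learned reward weights:", w)
```

The relative sizes of `beta` and `lam` play the role of the "exploitation-exploration" balance discussed in the abstract: a larger model-sample penalty makes the reward more conservative, while larger data weights exploit the offline datasets more aggressively.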